Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, and image processing. Recent successes of convolutional neural networks in various deep learning applications place even higher demands on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make GPUs a natural choice for accelerating convolution operations. However, maximally exploiting the available memory bandwidth of GPUs for convolution is a challenging task. This paper introduces a general model to address the mismatch between the memory bank width of GPUs and the computation data width of threads. Based on this model, we develop two convolution kernels, one for the general case and the other for a special case with one input channel. By carefully optimizing memory access patterns and computation patterns, we design a communication-optimized kernel for the special case and a communication-reduced kernel for the general case. Experimental results from implementations on Kepler GPUs show that our kernels achieve 5.16X and 35.5% average performance improvements over the latest cuDNN library, for the special case and the general case, respectively.
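For orientation, the sketch below shows a naive direct 2D convolution kernel for the one-input-channel special case the abstract mentions, with one thread per output element. This is not the paper's communication-optimized kernel; all identifiers (conv2d_single_channel, H, W, K) are hypothetical, and the per-thread global-memory access pattern shown here is exactly the kind whose width mismatch with the GPU's memory banks the paper's model targets.

```cuda
#include <cuda_runtime.h>

// Minimal illustrative baseline, not the paper's kernel: naive direct 2D
// convolution with a single input channel. Each thread computes one output
// pixel of a valid convolution, so the output is (H-K+1) x (W-K+1).
__global__ void conv2d_single_channel(const float* __restrict__ in,
                                      const float* __restrict__ filt,
                                      float* __restrict__ out,
                                      int H, int W, int K)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int outH = H - K + 1;
    int outW = W - K + 1;
    if (x >= outW || y >= outH) return;

    float acc = 0.0f;
    // Each thread reads a K x K window from global memory; neighboring
    // threads read overlapping windows, which is the redundant traffic an
    // optimized kernel would stage through shared memory instead.
    for (int i = 0; i < K; ++i)
        for (int j = 0; j < K; ++j)
            acc += in[(y + i) * W + (x + j)] * filt[i * K + j];

    out[y * outW + x] = acc;
}
```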